Sampling Reloaded
Authors
Abstract
Sampling methods are integral to the design of surveys and experiments, to the validity of results, and thus to the study of statistics, social science, and a variety of other disciplines that use statistical data. Many probabilistic sampling methods have been proposed in the literature for capturing large amounts of data succinctly. Such methods are bound by space and time constraints and find direct applications in approximate query answering systems, where the main objective is to provide a quick but approximate answer to a user query, with error guarantees. The main principle behind the design of such systems is that for very large data sets, on which executing complex queries is time consuming, it is much better to provide an approximate answer. Here we present a comprehensive survey of the various sampling methods and categorize them on the basis of their underlying algorithms. We present a detailed comparative analysis of various techniques and validate our observations through extensive simulations. All major queries on data streams can be categorized into (1) Count Distinct: estimating the number of distinct values in the data stream, and (2) Frequency Estimation: estimating the approximate frequencies of elements. We study the efficiency of various schemes in answering these fundamental queries. We have analyzed the effects of sampling probability, space, and error bounds on the efficiency of various schemes. On the basis of our experimental results, we also present some heuristics for approximating count and count distinct in various scenarios.
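As an illustration of the two query types named in the abstract, the following is a minimal Python sketch, not the authors' method. It assumes Bernoulli sampling (keep each element with probability p, scale counts by 1/p) for frequency estimation and a k-minimum-values sketch for count distinct. The function names, the seed parameter, the default k, and the SHA-1 based hash are all hypothetical choices made for this example.

```python
import hashlib
import heapq
import random
from collections import Counter


def estimate_frequencies(stream, p, seed=0):
    """Bernoulli sampling (assumed technique): keep each element with
    probability p, then scale the sampled counts by 1/p."""
    rng = random.Random(seed)
    sample = Counter(x for x in stream if rng.random() < p)
    return {item: count / p for item, count in sample.items()}


def _hash01(item):
    """Map an item to a pseudo-uniform float in [0, 1] (illustrative hash)."""
    digest = hashlib.sha1(repr(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def estimate_distinct(stream, k=256):
    """K-minimum-values sketch (assumed technique): remember only the k
    smallest distinct hash values seen in the stream."""
    heap = []        # negated hashes, so heap[0] is the largest kept hash
    members = set()  # hashes currently kept, to skip repeated elements
    for item in stream:
        h = _hash01(item)
        if h in members:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            evicted = -heapq.heappushpop(heap, -h)
            members.discard(evicted)
            members.add(h)
    if len(heap) < k:
        # Fewer than k distinct hashes seen: the count is exact,
        # barring hash collisions.
        return float(len(heap))
    # The k-th smallest of n uniform hashes sits near k/n, so invert it.
    return (k - 1) / (-heap[0])
```

Both functions trade accuracy for space: the frequency estimator stores only about a fraction p of the stream, and the distinct-count estimator stores at most k hash values. Raising p or k tightens the error at the cost of more memory, which is the kind of probability, space, and error trade-off the abstract says the survey analyzes.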
Similar resources
The Matrix Reloaded: New Insights from Type IV Collagen Derived Endogenous Angiogenesis Inhibitors and their Mechanism of Action
Angiogenesis, the process of neovascularization from parent blood vessels, is a prerequisite for many physiological and pathological conditions that is regulated by a balance between the levels of endogenous angiogenic stimulators and matrix reloaded angiogenic regulators. Several non-collagenous carboxy terminal end domains in chains of type IV collagen matrix reloaded molecules selectively in...
Morfeusz Reloaded
The paper presents recent developments in Morfeusz – a morphological analyser for Polish. The program, being already a fundamental resource for processing Polish, has been reimplemented with some important changes in the tagset, some new options, added information on proper names, and ability to perform simple prefix derivation. The present version of Morfeusz (including its dictionaries) is ma...
Kleinberg's Grid Reloaded
One of the key features of small-worlds is the ability to route messages with few hops only using local knowledge of the topology. In 2000, Kleinberg proposed a model based on an augmented grid that asymptotically exhibits such property. In this paper, we propose to revisit the original model from a simulation-based perspective. Our approach is fueled by a new algorithm that uses dynamic reject...
Publication date: 2005